Lego brick colours

The following is an investigation into Lego brick colours. The idea was inspired by Hanna Yan Han’s article here. The data is from Rebrickable, from which I created a local MySQL DB to access it (DB set up notes are here).

What I did

  • I explored a Lego database that I previously created (here)
  • Initial focus was an exploration into Lego brick colours.
  • This led to several questions being raised regarding colours and trends that could be seen over time.
  • So, further investigations were undertaken into three of the previous observations.

Why I did it

This project was inspired by Hanna Yan Han, made use of the local database I’d created, and aligned with my lifelong interest in Lego.

What I learnt

From focusing on the colour of Lego Bricks:

  • There have been a total of 273 colours between 1949 and 2025.
  • There are few colours with a purple/pink or green hue at high saturation.
  • The number of colours steadily increased from 6 to 38 between 1950 and 1995.
  • Then between 1997 and 2006 the number of colours quickly increased to over 125.
  • This was followed in 2007 by an abrupt decrease to around 75 colours, which has been maintained through 2025.

Looking at sets between 1958 and 1967

  • The sets with the most parts were Deluxe Building Set and Starter Train Set with Motor.
  • There were 2 sets in the database that seem to be placeholders for unused parts (Unused Modulex parts… & Unused Parts Database Set…).
  • It can also be seen that themes in this period sometimes relate to the manufacturer, e.g. Samsonite.

The Lime bricks from the 80’s that stood out

  • The lime coloured bricks were called Fabuland Lime and featured in 5 sets.
  • Parts included a watering can, bathtub and a toilet.

Data trends in the 2000’s

  • Generally, there is an increase in themes, sets and colours starting in the mid-1900’s.
  • There is also a double peak in the number of new sets in 2000 and 2015, with a dip in number of sets in 2005.
  • Based on common knowledge about Lego business struggles at this time and their business recovery, some correlations were proposed, but without further data could not be verified.
# Import Library
library(RMySQL)
library(plotwidgets)
library(ggplot2)
library(reshape2)
# library(extrafont)

# DB connection
# source("/Users/steveuser/Documents/Repositories/Lego_data/getDBConn.r")  # Mac path
source("C:/Users/SteveLocal/Documents/Repos/Lego_data/getDBConn.r")  # Windows path

# Custom plot theme
custom_theme <- function() {
  font <- "mono"
  theme_light() +
    theme(
      legend.position = "none",
      plot.title = element_text(family = font,
                                size = 16,
                                face = "bold",
                                hjust = 0,
                                vjust = 2),
      plot.subtitle = element_text(family = font,
                                   size = 12),
      plot.caption = element_text(family = font,
                                  size = 9,
                                  hjust = 1),
      axis.title = element_text(family = font,
                                size = 10),
      axis.text = element_text(family = font,
                               size = 9),
      axis.text.x = element_text(margin = margin(5, b = 10))
    )
}
theme_set(custom_theme())

The above code chunk loads the libraries, opens the database connection and sets the default plot theme.

Data processing

The data for all the colour analysis can be found in the ‘colors’ table in the Lego DB. For plotting, the individual colour values for both RGB and HSL were pulled out into their own columns. Also, a mapping list was created to map each Lego brick colour name to its colour value (this makes life easy when manually setting the colour scale with scale_fill_manual()).

# Note: 'conn' was generated from a script to hide the user details
#       dbGetQuery is a DBI generic; its MySQL method comes with the 'RMySQL' library loaded above

data_from_mysql <- dbGetQuery(conn, "SELECT * FROM colors")

# Split colour values out for plotting
data_from_mysql$rgb_hex <- paste0("#", data_from_mysql$rgb)
# HSL values (col2hsl returns a matrix with rows H, S, L; convert once and reuse)
hsl_values <- col2hsl(data_from_mysql$rgb_hex)
data_from_mysql$H <- hsl_values[1,]
data_from_mysql$S <- hsl_values[2,]
data_from_mysql$L <- hsl_values[3,]
# RGB values (col2rgb returns a matrix with rows red, green, blue)
rgb_values <- col2rgb(data_from_mysql$rgb_hex)
data_from_mysql$R <- rgb_values[1,]
data_from_mysql$G <- rgb_values[2,]
data_from_mysql$B <- rgb_values[3,]

# Mapping Lego Brick colour to name for manual scale plotting
fill_colour_value_mapping_byName <- list()
for(i in 1:nrow(data_from_mysql)){
  fill_colour_value_mapping_byName[i] <- data_from_mysql$rgb_hex[i]
}
names(fill_colour_value_mapping_byName) <- data_from_mysql$name

# Additional non-Lego brick colours for ease of plotting
fill_colour_value_mapping_byName <- append(fill_colour_value_mapping_byName, c(None = NA))
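
To make the H, S and L columns less of a black box, the following is a minimal sketch of the standard RGB to HSL conversion in base R. It is not part of the original analysis (which uses col2hsl from plotwidgets); hex_to_hsl is a hypothetical helper, and it assumes the same conventions as the plots in this post: hue in degrees (0 to 360), saturation and lightness between 0 and 1.

```r
# Hypothetical helper: convert hex colours to HSL using base R only
hex_to_hsl <- function(hex) {
  rgb <- col2rgb(hex) / 255            # 3 x n matrix scaled to [0, 1]
  r <- rgb[1, ]
  g <- rgb[2, ]
  b <- rgb[3, ]
  mx <- pmax(r, g, b)
  mn <- pmin(r, g, b)
  delta <- mx - mn

  # Lightness is the midpoint of the largest and smallest channel
  l <- (mx + mn) / 2
  # Saturation is 0 for greys (delta == 0)
  s <- ifelse(delta == 0, 0, delta / (1 - abs(2 * l - 1)))
  # Hue depends on which channel is the largest; computed in sixths of a turn
  h <- ifelse(delta == 0, 0,
       ifelse(mx == r, ((g - b) / delta) %% 6,
       ifelse(mx == g, (b - r) / delta + 2,
                       (r - g) / delta + 4)))

  data.frame(H = h * 60, S = s, L = l)
}

# Usage: pure red, green and blue sit a third of the hue circle apart
primaries <- hex_to_hsl(c("#FF0000", "#00FF00", "#0000FF"))
```

A fully saturated colour has S = 1 and a grey has S = 0, which is why the saturation axis in the plots runs from 0 to 1.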

Exploration of the data

The following explores Lego brick colours as well as the additional data incorporated in the ‘colors’ database table, which includes:

  • number of parts (num_parts)
  • number of lego sets (num_sets)
  • first year introduced (y1)
  • last year used (y2)

All the lego brick colours

The following 2 charts show all the Lego brick colours from 1949 to 2025 (when the data was downloaded) based on Hue, Saturation and Lightness (HSL).

The final chart shows the colours that were used per year between 1949 and 2025.

# Saturation vs Lightness, fill <- brick colour (by name)
ggplot() +
  geom_point(data = data_from_mysql,
             mapping = aes(x = L, y = S, fill = name),
             shape = 21,
             stroke = 0.5,
             size = 5,
             colour = "black",
             alpha = 1.0) +
  scale_fill_manual(values = fill_colour_value_mapping_byName) +
  scale_x_continuous(breaks = c(0, 1)) +
  scale_y_continuous(breaks = c(0, 1)) +
  labs(title = "All Lego brick colours: Lightness vs Saturation",
       subtitle = "273 different colours from 1949 to 2025",
       x = "Lightness",
       y = "Saturation") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
All the Lego brick colours. Saturation vs Lightness, fill is brick colour (not hue).


The above plot shows all the different Lego brick colours between 1949 and 2025. It indicates a higher density of colours at full saturation (S = 1).

# Radial: Hue vs Saturation, fill <- brick colour (by name)
ggplot(data = data_from_mysql,
       mapping = aes(x = H, y = S, fill = name)) +
  geom_point(shape = 21,
             stroke = 0.5,
             size = 5,
             colour = "black",
             alpha = 1.0) +
  coord_polar(theta = "x") +
  scale_fill_manual(values = fill_colour_value_mapping_byName) +
  scale_x_continuous(breaks = c(0, 360)) +
  scale_y_continuous(breaks = seq(0, 1, 0.2)) +
  labs(title = "Lego brick colour: Saturation vs Hue",
       subtitle = "Red areas highlight sparsely populated brick colours",
       x = "Hue",
       y = "Saturation") +
  # Constants are set outside aes() so a single red outline is drawn per region
  annotate("rect", xmin = 85, xmax = 180,
           ymin = 0.65, ymax = 1,
           colour = "red", fill = NA) +
  annotate("rect", xmin = 235, xmax = 335,
           ymin = 0.5, ymax = 1,
           colour = "red", fill = NA)
All the Lego brick colours. Saturation vs Hue, fill is brick colour (not hue).


When you look at the same colours with Hue on a polar axis, there seem to be 2 areas that have a lower density of brick colours. These are highlighted by the red rectangles. This implies there are few brick colours in the purple/pink and green hues with saturation above roughly 0.5.
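
The visual impression of sparse regions can be checked by counting the colours that fall inside each highlighted window. The following is a minimal sketch rather than part of the original analysis: the toy data frame stands in for the H and S columns of data_from_mysql (its values are made up), count_in_region is a hypothetical helper, and the window limits are the ones used for the rectangles in the plot.

```r
# Toy stand-in for the H and S columns of data_from_mysql (values are made up)
colours <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  H = c(10, 100, 120, 250, 300, 200),
  S = c(0.9, 0.8, 0.3, 0.7, 0.2, 0.95),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count colours inside one hue/saturation window
count_in_region <- function(df, h_min, h_max, s_min, s_max = 1) {
  sum(df$H >= h_min & df$H <= h_max &
      df$S >= s_min & df$S <= s_max)
}

# Same limits as the two rectangles drawn on the polar plot
green_count  <- count_in_region(colours, h_min = 85,  h_max = 180, s_min = 0.65)
purple_count <- count_in_region(colours, h_min = 235, h_max = 335, s_min = 0.5)
```

Dividing each count by the area of its window, and comparing against the same figure for the rest of the plot, would give a rough density comparison.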

# Dataframe to capture each year a colour was used
data_colour_by_year <- data.frame(
  year = seq(min(data_from_mysql$y1, na.rm = TRUE),
             max(data_from_mysql$y2, na.rm = TRUE),
             1))

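# Only the row index is needed here, so the data does not have to be reordered
for (i in seq_len(nrow(data_from_mysql))) {
  name_label <- data_from_mysql$name[i]
  # 1 if the colour was in use that year, 0 otherwise (NA when y1/y2 are missing)
  data_colour_by_year[[name_label]] <- as.numeric(
    data_colour_by_year$year >= data_from_mysql$y1[i] &
    data_colour_by_year$year <= data_from_mysql$y2[i])
}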

data_colour_by_year <- melt(data = data_colour_by_year,
                            id.vars = "year",
                            variable.name = "name",
                            value.name = "present")
data_colour_by_year$present[is.na(data_colour_by_year$present)] <- 0

data_colour_by_year$H <- NA
for (i in 1:nrow(data_from_mysql)){
  data_colour_by_year$H[which(data_from_mysql$name[i] == data_colour_by_year$name)] <- data_from_mysql$H[i]
}

ggplot(data = data_colour_by_year,
       mapping = aes(x = year, y = present, fill = reorder(name, H))) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = fill_colour_value_mapping_byName) +
  labs(title = "Lego brick colour: Colour count by year",
       subtitle = "",
       x = "Year",
       y = "Colour count")
Number of Lego brick colours used per year. Fill is brick colour


The above plot shows the number of colours used per year. From this it can be observed that:

  • There is a general increase in the number of colours over time.
  • Between 1958 and 1967 the number of colours is higher than the trend but constant (median 18).
  • There is significant growth in the number of colours between 1997 and 2006.
    • The number of colours peaks at over 125.
  • In 2007 there was an abrupt decrease to a consistent number (~75) of colours, which continues to the present.
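
The per-year counts behind these observations can also be computed directly, without reading them off the stacked bars. The following is a minimal sketch under stated assumptions: the toy data frame stands in for the name, y1 and y2 columns of data_from_mysql (its values are made up), and colour_count_by_year is a hypothetical name not used elsewhere in this post.

```r
# Toy stand-in for the name, y1 and y2 columns of data_from_mysql (made-up values)
colours <- data.frame(
  name = c("Red", "Blue", "Lime", "Teal"),
  y1 = c(1950, 1955, 1981, NA),
  y2 = c(2025, 1990, 1985, NA),
  stringsAsFactors = FALSE
)

years <- seq(min(colours$y1, na.rm = TRUE), max(colours$y2, na.rm = TRUE))

# For each year, count the colours whose [y1, y2] range contains that year;
# colours with missing years are ignored via na.rm = TRUE
n_colours <- vapply(years, function(y) {
  sum(colours$y1 <= y & colours$y2 >= y, na.rm = TRUE)
}, integer(1))

colour_count_by_year <- data.frame(year = years, n_colours = n_colours)

# Median number of colours over a chosen window, here 1981 to 1985
in_window <- colour_count_by_year$year >= 1981 & colour_count_by_year$year <= 1985
median_in_window <- median(colour_count_by_year$n_colours[in_window])
```

Applied to the real table, the same window calculation for 1958 to 1967 would give the median quoted above.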

What I want to know…

The above leaves me with the following questions:

  • What were the Lego sets between 1958 and 1967 that caused the ~18 colour plateau?
  • What Lego sets used the lime coloured bricks for 5 years in the early 80’s?
    • Were they all the same brick?
  • How do the Lego themes and sets vary (if at all) with the rapid growth in colours in the early 2000’s?

Further investigation

The following section explores the aforementioned questions.

Lego sets between 1958 and 1967

Previous analysis showed that between 1958 and 1967 there was a plateau in the number of colours used in Lego sets (median 18 colours). The following section explores what lego sets were available in this time period and what colours made up the bricks in these sets.

The following 2 plots show the themes available between 1958 and 1967, detailing the number of bricks by colour in each theme, as well as the number of sets per theme.

data_sixty_brickCol <- dbGetQuery(conn,
                                  "SELECT themes.name as 'themes_name',
                                          colors.name as 'colour_name',
                                          COUNT(parts.part_num) as 'part_count'
                                    FROM sets
                                      INNER JOIN themes ON sets.theme_id = themes.id
                                      INNER JOIN inventories ON sets.set_num = inventories.set_num
                                      INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
                                      INNER JOIN colors ON inventory_parts.color_id = colors.id
                                      INNER JOIN parts ON inventory_parts.part_num = parts.part_num
                                  WHERE (sets.year >= 1958 AND sets.year <=1967)
                                  GROUP BY themes_name, colour_name
                                  ORDER BY themes_name;")

data_sixty_setTheme <- dbGetQuery(conn,
                                  "SELECT themes.name as 'themes_name',
                                          COUNT(sets.set_num) as 'sets_count'
                                    FROM sets
                                      INNER JOIN themes ON sets.theme_id = themes.id
                                  WHERE (sets.year >= 1958 AND sets.year <=1967)
                                  GROUP BY themes_name
                                  ORDER BY themes_name;")

# Label for each facet, e.g. "5 sets" (paste() is vectorised, so no loop is needed)
data_sixty_setTheme$sets_label <- paste(data_sixty_setTheme$sets_count, "sets")

# head(data_sixty_brickCol, n = 10)
# head(data_sixty_setTheme, n = 10)

# Stacked bar: number of parts by colour in each theme
ggplot(data = data_sixty_brickCol,
       mapping = aes(x = themes_name, y = part_count, fill = colour_name)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = fill_colour_value_mapping_byName) +
  labs(title = "Lego brick colour: 1958 to 1967, themes",
       subtitle = "Number of bricks by colour in each theme",
       x = "Theme name",
       y = "Brick count") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Number of bricks in themes from 1958 to 1967. Fill is brick colour.


ggplot(data = data_sixty_brickCol,
       mapping = aes(x = 5, y = part_count, fill = colour_name)) +
  geom_bar(stat = "identity", position = "fill", colour = "black", linewidth = 0.1) +
  geom_text(data = data_sixty_setTheme, mapping = aes(x = -Inf, y = -Inf, fill = NA, label = sets_label)) +
  coord_polar(theta = "y") +
  scale_x_continuous(limits = c(3, NA)) +
  scale_fill_manual(values = fill_colour_value_mapping_byName) +
  facet_wrap(~ themes_name) +
  labs(title = "Lego brick colour: 1958 to 1967, themes",
       subtitle = "The number of sets within a theme is labelled for each plot",
       x = "Theme name",
       y = "Colour of bricks (% of bricks in theme)") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
Number of bricks in themes from 1958 to 1967. Fill is brick colour, number of sets per theme labelled in centre of each plot.


It can be seen in the above that most of the themes have the same colours, with the exception of:

  • Database Sets
  • HO 1:87 Vehicles
  • Modulex

Also, themes like Database Sets & Samsonite in this time period relate less to the content of the sets and more to logistical categorisation, e.g. according to Brickipedia, Samsonite was a US manufacturer of Lego.

data_sixty_setsParts <- dbGetQuery(conn,
                                    "SELECT sets.name as 'sets_name',
                                          COUNT(parts.part_num) as 'part_count'
                                    FROM sets
                                      INNER JOIN themes ON sets.theme_id = themes.id
                                      INNER JOIN inventories ON sets.set_num = inventories.set_num
                                      INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
                                      INNER JOIN colors ON inventory_parts.color_id = colors.id
                                      INNER JOIN parts ON inventory_parts.part_num = parts.part_num
                                  WHERE (sets.year >= 1958 AND sets.year <=1967)
                                  GROUP BY sets_name
                                  ORDER BY part_count DESC;")

data_sixty_setsParts_col <- dbGetQuery(conn,
                                        "SELECT sets.name as 'sets_name',
                                                colors.name as 'colour_name',
                                                COUNT(parts.part_num) as 'part_count'
                                          FROM sets
                                            INNER JOIN themes ON sets.theme_id = themes.id
                                            INNER JOIN inventories ON sets.set_num = inventories.set_num
                                            INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
                                            INNER JOIN colors ON inventory_parts.color_id = colors.id
                                            INNER JOIN parts ON inventory_parts.part_num = parts.part_num
                                        WHERE (sets.year >= 1958 AND sets.year <=1967)
                                        GROUP BY sets_name, colour_name
                                        ORDER BY part_count DESC;")

# Label for each facet, e.g. "144 parts" (paste() is vectorised, so no loop is needed)
data_sixty_setsParts$part_label <- paste(data_sixty_setsParts$part_count, "parts")

# head(data_sixty_setsParts, n = 14)
# head(data_sixty_setsParts_col, n = 10)

# Select top sets: flag the rows belonging to the 16 sets with the most parts
top_sets_index <- data_sixty_setsParts_col$sets_name %in% data_sixty_setsParts$sets_name[1:16]

ggplot(data = data_sixty_setsParts_col[which(top_sets_index), ],
       mapping = aes(x = 5, y = part_count, fill = colour_name)) +
  geom_bar(stat = "identity", position = "fill", colour = "black", linewidth = 0.1) +
  geom_text(data = data_sixty_setsParts[1:16, ], mapping = aes(x = -Inf, y = -Inf, fill = NA, label = part_label), size = 3) +
  coord_polar(theta = "y") +
  scale_x_continuous(limits = c(3, NA)) +
  scale_fill_manual(values = fill_colour_value_mapping_byName) +
  facet_wrap(~ sets_name) +
  labs(title = "Lego brick colour: 1958 to 1967, sets",
       subtitle = "The number of parts within a set is labelled for each plot",
       x = "Top 16 Lego sets by number of parts",
       y = "Colour of bricks (% of bricks in set)") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text = element_text(size = 6))
% of bricks by colour in the top 16 sets from 1958 to 1967. The number of parts per set is labelled in the centre of each plot.


Addressing the original question of what makes up the bricks between 1958 and 1967, the majority of parts are in an ‘unused’ category (Unused Modulex parts sold by LEGO & Unused Parts Database Set - Pre-1965, with 1863 & 200 parts respectively). It is assumed that these parts were not commercially available, but were held in the database for completeness. So removing these 2 sets from the analysis, the top 3 sets with the most parts are:

  • Deluxe Building Set with 144 parts
  • Starter Train Set with Motor with 143 parts
  • Basic Building Set with Train with 116 parts
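
The exclusion step described above can be sketched as follows. This is an illustration and not the code used for the analysis: the small data frame holds only the five set names and part counts quoted in this section, and matching placeholder sets on the prefix "Unused" is an assumption that happens to fit these two names.

```r
# The five sets and part counts quoted in the text above
sets_parts <- data.frame(
  sets_name = c("Unused Modulex parts sold by LEGO",
                "Unused Parts Database Set - Pre-1965",
                "Deluxe Building Set",
                "Starter Train Set with Motor",
                "Basic Building Set with Train"),
  part_count = c(1863, 200, 144, 143, 116),
  stringsAsFactors = FALSE
)

# Assumption: placeholder sets can be recognised by a name starting with "Unused"
is_placeholder <- grepl("^Unused", sets_parts$sets_name)
commercial_sets <- sets_parts[!is_placeholder, ]

# Largest remaining sets by part count
top_sets <- head(
  commercial_sets[order(commercial_sets$part_count, decreasing = TRUE), ],
  n = 3
)
```

With the full query result, the same filter would be applied to data_sixty_setsParts before selecting the top sets for plotting.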

Lime coloured bricks in the early 80’s

With a quick enquiry (see below) it can be seen that there were 2 colours that were both introduced and retired in the 80’s:

  • Fabuland Orange
  • Fabuland Lime
# Find colours only in the 80's
head(data_from_mysql$name[data_from_mysql$y1 > 1980 &
                       data_from_mysql$y2 < 1990 &
                       !is.na(data_from_mysql$y1)], n = 10)
## [1] "Fabuland Orange" "Fabuland Lime"

Filtering on ‘Fabuland Lime’ returns 15 part entries:

data_lime_bricks <- dbGetQuery(conn, 
                                "SELECT sets.set_num as 'sets_num', 
                                          sets.name as 'sets_name', 
                                          colors.name as 'color_name',
                                          parts.name as 'part_name'
                                    FROM sets
                                      INNER JOIN themes ON sets.theme_id = themes.id
                                      INNER JOIN inventories ON sets.set_num = inventories.set_num
                                      INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
                                      INNER JOIN colors ON inventory_parts.color_id = colors.id
                                      INNER JOIN parts ON inventory_parts.part_num = parts.part_num
                                  WHERE colors.name = 'Fabuland Lime';")

# head(data_lime_bricks, n = 10)
data_lime_bricks
##    sets_num         sets_name    color_name                           part_name
## 1    3715-1      Flower Stand Fabuland Lime    Fabuland, Equipment Watering Can
## 2    3781-1 Maximillian Mouse Fabuland Lime    Fabuland, Equipment Watering Can
## 3    3707-1        Clover Cow Fabuland Lime    Fabuland, Equipment Watering Can
## 4    2770-1        Play House Fabuland Lime                 Duplo Bathroom Sink
## 5    2770-1        Play House Fabuland Lime                       Duplo Bathtub
## 6    2770-1        Play House Fabuland Lime          Duplo Shower Head on Stand
## 7    2770-1        Play House Fabuland Lime      Duplo Mirror with Silver Print
## 8    2770-1        Play House Fabuland Lime          Duplo Toilet (without Rim)
## 9    2754-1          Bathroom Fabuland Lime Duplo Chair 2 x 2 x 2 with One Stud
## 10   2754-1          Bathroom Fabuland Lime           Duplo Cabinet 2 x 2 x 1.5
## 11   2754-1          Bathroom Fabuland Lime                 Duplo Bathroom Sink
## 12   2754-1          Bathroom Fabuland Lime                       Duplo Bathtub
## 13   2754-1          Bathroom Fabuland Lime          Duplo Shower Head on Stand
## 14   2754-1          Bathroom Fabuland Lime      Duplo Mirror with Silver Print
## 15   2754-1          Bathroom Fabuland Lime          Duplo Toilet (without Rim)

So, to answer the proposed question: in the above it can be seen that ‘Fabuland Lime’ was used to make 8 different bricks. 7 bricks made up parts of a Duplo bathroom across 2 sets, and the final brick was a watering can used across 3 sets.
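
These counts can be confirmed from the query result rather than by eye. The following is a minimal sketch: the data frame re-enters the sets_num and part_name columns of the 15 rows printed above (so that it runs without the database), and the variable names are hypothetical.

```r
# The 15 rows printed above, reduced to set number and part name
lime <- data.frame(
  sets_num = c("3715-1", "3781-1", "3707-1",
               rep("2770-1", 5), rep("2754-1", 7)),
  part_name = c(rep("Fabuland, Equipment Watering Can", 3),
                "Duplo Bathroom Sink", "Duplo Bathtub",
                "Duplo Shower Head on Stand",
                "Duplo Mirror with Silver Print",
                "Duplo Toilet (without Rim)",
                "Duplo Chair 2 x 2 x 2 with One Stud",
                "Duplo Cabinet 2 x 2 x 1.5",
                "Duplo Bathroom Sink", "Duplo Bathtub",
                "Duplo Shower Head on Stand",
                "Duplo Mirror with Silver Print",
                "Duplo Toilet (without Rim)"),
  stringsAsFactors = FALSE
)

# Number of different bricks and sets that use the colour
n_distinct_parts <- length(unique(lime$part_name))
n_distinct_sets  <- length(unique(lime$sets_num))

# Number of sets each brick appears in
sets_per_part <- tapply(lime$sets_num, lime$part_name,
                        function(x) length(unique(x)))
```

On the real result the same three lines can be run against data_lime_bricks, using its sets_num and part_name columns.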

Image of Lego set with Fabuland Lime bricks.